otherwise the far path needs to be used. When the input floating point data elements 
require at most a 1-bit alignment, it is possible that when performing an unlike-signed 
addition (i.e. equivalent to subtracting one data element from the other) massive 
cancellation may occur, and to enable the resultant floating point value to be correctly 
aligned, it is then necessary to provide normalisation logic within the near path. Such 
logic is not required in the far path. However, in the far path, it is necessary to provide 
rounding logic due to the fact that the data elements may need more than a 1-bit 
alignment. Such rounding logic is not required in the near path. 

Accordingly, by providing a near path and a far path, the length of each path can 
be made shorter than would be the case if a single unitary path were provided for 
performing the data processing operation, and this can hence produce an increase in 
processing speed. For example, considering the earlier pipelined processing logic 
example, the pipeline depth can be reduced by using a near path and a far path, which can 
give rise to an increase in processing speed when compared with a unitary processing path. 
However, one problem that arises when providing more than one processing path for 
performing the data processing operation is in determining whether the alignment 
condition required for using any particular path does in fact exist. 

In accordance with the technique discussed in the above-mentioned paper from 
the 15th IEEE Symposium on Computer Arithmetic, prediction logic is used to make an 
early prediction as to whether the alignment condition for the near path appears to exist. 
However, predicted results by their very nature will not necessarily be true, and 
accordingly it is necessary to perform the processing in both the near path and the far 
path until such time as the presence of the alignment condition can actually be 
determined. Hence, whilst the predicted result can be used to perform some initial 
processing, for example shifting, in the near path, it is not until the actual alignment 
condition is positively determined that the result from any particular path can be used. 
Hence, such an approach is not very power efficient, since the data processing operation 
needs to be performed in both processing paths. Further, this has some impact on the 
area required for the processing logic, since further logic is needed in addition to the 
prediction logic to perform the actual detection of the alignment condition at a later stage 
in the processing path, and to manage the computations being performed within several 
different processing paths. 

A further problem is that because the prediction may be wrong, any assumptions 
made in an early part of the near processing path based on that prediction cannot be used 
in the far processing path, since if the prediction proves wrong it will be necessary to rely 
on the processing performed in the far processing path in order to generate the correct 
result. Accordingly, there is no opportunity to share logic between the near and far 
processing paths, which again leads to an implementation which is inefficient in terms of 
size of the processing logic, and in terms of power consumption. 

It is an object of the present invention to provide an improved technique for 
determining a processing path to be used to perform a data processing operation on input 
data elements. 

SUMMARY OF THE INVENTION 
Viewed from a first aspect, the present invention provides a data processing 
apparatus for performing a data processing operation on first and second floating point 
data elements, the first floating point data element specifying a first exponent and the 
second floating point data element specifying a second exponent, the data processing 
apparatus comprising: processing logic providing multiple processing paths which are 
selectable to perform the data processing operation, including a first processing path 
operable to perform the data processing operation if a predetermined alignment condition 
exists; at least one detector logic unit operable to receive both said first exponent and 
said second exponent, and to detect the presence of said predetermined alignment 
condition, each detector logic unit comprising: half adder logic operable to perform a 
number of half adder operations to logically subtract one of the first and second 
exponents from the other of the first and second exponents to produce at least a sum data 
value of sum and carry data values representing the result of the number of half adder 
operations; and generation logic operable to receive the sum data value and to generate a 
select signal which is set if the sum data value has a predetermined value indicating the 
existence of said predetermined alignment condition; the processing logic being operable 
to select the first data processing path to perform the data processing operation if the 
select signal from one of said at least one detector logic units is set. 

In accordance with the present invention, a detector logic unit is provided which 
can detect the presence of the predetermined alignment condition required for using a 
first processing path, and which can be used instead of the prediction logic used in the 
prior art. The detector logic unit of the present invention employs half adder logic to 
perform a number of half adder operations to logically subtract one of the first and 
second exponents from the other of the first and second exponents. A half adder 
operation typically produces a carry data value and a sum data value. In particular, if it is 
assumed that the first and second floating point data elements are X and Y, where 
X = x_{n-1} ... x_1 x_0 and Y = y_{n-1} ... y_1 y_0 are n-bit words with low order bits x_0 and y_0, 
an n-bit half adder produces a carry word C = c_{n-1} ... c_1 0 and a sum word 
S = s_{n-1} ... s_1 s_0 such that 
carry c_i = x_{i-1} AND y_{i-1}     (1) 
sum s_i = x_i XOR y_i     (2) 
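
Equations (1) and (2) can be illustrated with a short software sketch. The function name `half_adder` and the use of Python integers to hold n-bit words are assumptions made purely for illustration; the specification itself describes hardware logic, not software.

```python
def half_adder(x: int, y: int, n: int) -> tuple[int, int]:
    """Return (C, S) for n-bit words x and y, following equations (1) and (2)."""
    mask = (1 << n) - 1
    s = (x ^ y) & mask          # s_i = x_i XOR y_i
    c = ((x & y) << 1) & mask   # c_i = x_{i-1} AND y_{i-1}, hence c_0 = 0
    return c, s


# Illustrative 8-bit operands (arbitrary values chosen for this sketch).
n = 8
x, y = 0b10110100, 0b01100110
c, s = half_adder(x, y, n)

# The pair (C, S) represents the same value as X + Y, modulo 2^n.
assert (c + s) % (1 << n) == (x + y) % (1 << n)
```

No carry propagates between bit positions in this step, which is why a half adder operation is fast compared with a full addition of C and S.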

By adding C and S together, it will be possible to determine whether the 
predetermined alignment condition exists, but in practice there would be insufficient time 
to perform that addition early enough to enable the output to be used at an early stage to 
select the required processing path to perform the data processing operation. Only if the 
detection of the alignment condition can be determined at an early stage can significant 
savings in power and area be achieved relative to the earlier described prior art 
techniques. 

The inventors of the present invention realised that the properties of the half-adder 
form (C,S) dictated by the above equations (1) and (2) mean that it is possible, 
once a number of half adder operations have been performed, to determine the presence 
of the predetermined alignment condition from the sum data value alone. Accordingly, 
the detector logic unit of the present invention arranges the half adder logic to produce at 
least a sum data value of the sum and carry data values representing the result of the 
number of half adder operations, and generation logic is then provided to receive the sum 
data value and to generate a select signal which is set if the sum data value has a 
predetermined value indicating the existence of the predetermined alignment condition. 
This means that there is no requirement to add together the carry data value C and the 
sum data value S, and this enables the existence of the predetermined alignment 
condition to be detected significantly more quickly than was previously thought possible. 

Accordingly, through use of the present invention it is possible for the processing 
logic to be operable to select the first data processing path to perform the data processing 
operation if the select signal from one of the at least one detector logic units is set. 
Because of the speed with which the detector logic unit of the present invention detects 
the presence of the predetermined alignment condition, this selection can be performed at 
an early stage in the processing path, and hence allow significant savings to be made in 
terms of both power and area. 

The number of half adder operations performed by the half adder logic will vary 
dependent on the predetermined alignment condition to be detected. However, in one 
embodiment, the number of half adder operations performed by the half adder logic is a 
plurality of half adder operations. 

In one embodiment, if the select signal from one of said at least one detector logic 
units is set, the processing logic is operable to prevent performance of the data processing 
operation in the processing paths other than the first processing path. This is possible 
due to the fact that the detector logic unit is able to detect the presence of the 
predetermined alignment condition significantly more quickly than known detectors, and 
in particular early enough to enable the processing paths other than the first processing 
path to be turned off to conserve power. It will be appreciated that there are a number of 
ways to prevent performance of the data processing operation in the processing paths 
other than the first processing path. In one particular embodiment, this is done by routing 
the select signal to logic which generates enable signals for the various components in 
the processing paths, with this logic then being arranged to disable the logic elements in 
the processing paths other than the first processing path upon receipt of a set select 
signal. 

It will be appreciated that the predetermined alignment condition may take a 
variety of forms dependent on, for example, the data processing operation being 
performed. However, in one embodiment, the predetermined alignment condition 
specifies that the first and second floating point data elements require at most a one-bit 
alignment, and the at least one detector logic unit is operable to detect whether the first 
and second exponents differ by one by determining whether the sum value has a value of 
-2, if the sum value has a value of -2 the at least one detector logic unit being operable to 
generate a shift signal in addition to the select signal. If the first and second exponents 
differ by one, then it is appropriate to shift the mantissa of one of the data elements so 
that they are aligned prior to performing the data processing operation. Accordingly, this 
shift signal can be routed to the logic within the first processing path used to perform 
such a shift. 

In such an embodiment, it will be appreciated that the select signal will be set if 
the first and second exponents differ by one. However, in addition, the select signal 
should still be set if the first and second floating point data elements are actually aligned. 
A separate zero-alignment detector can be used to make that detection, with the select 
signal then being produced if either the zero-alignment detector detects a zero-alignment, 
or the at least one detector logic unit detects a sum value of -2 (i.e. detects that the first 
and second exponents differ by one). 

However, in one embodiment of the present invention the at least one detector 
logic unit is operable to detect whether the first and second exponents are equal or differ 
by one by determining whether the sum value has a value of -1 or -2, if the sum value has 
a value of -1 or -2 the at least one detector logic unit being operable to generate the select 
signal. Accordingly, in such an embodiment, the at least one detector logic unit detects 
when the first and second exponents differ by one and also detects if the first and second 
exponents are equal, if they are equal this being indicated by the sum value having a 
value of -1. 

It will be appreciated that the half adder logic within each of the at least one 
detector logic units can take a variety of forms. However, in one embodiment, both the 
first and second exponents have n bits, and the half adder logic comprises: first n-bit half 
adder logic operable to perform a first half adder operation to logically subtract said one 
exponent from said other exponent to produce an intermediate sum value and an 
intermediate carry value; and additional logic operable to perform at least a partial 
second half adder operation to logically add the intermediate sum value and intermediate 
carry value to generate said sum value. It is possible for second n-bit half adder logic to 
be used instead of the additional logic referenced above, but since the generation logic 
only needs to use the sum data value in order to determine whether to generate a select 
signal, then the additional logic can be used instead of a second n-bit half adder logic to 
provide a more efficient implementation of the half adder logic. 

It will be appreciated that the additional logic may be constructed in a variety of 
ways. However, in one embodiment, the additional logic comprises XOR logic operable 
to perform an XOR operation on corresponding bits of the intermediate sum value and 
the intermediate carry value other than the least significant bit. 

The processing logic may take a variety of forms. However, in one embodiment 
the processing logic has a plurality of pipeline stages, and each processing path 
comprises multiple pipeline stages, the at least one detector logic unit being operable to 
generate the shift signal for input to a first pipeline stage of the first processing path, the 
first pipeline stage containing shift logic, and the shift signal being used to control the 
operation of the shift logic. Hence, the at least one detector logic is able to generate the 
shift signal in time to be input to a first pipeline stage of the first processing path, 
thus enabling any necessary shift of one of the data elements to take place prior to 
performing the data processing operation. Further, since this shift signal is based on an 
actual detection of the presence of the predetermined alignment condition, rather than 
a prediction of it, this means that the shift is guaranteed to be correct, and 
accordingly no further logic is required later in the processing paths to account for a 
situation in which an incorrect shift is made (as would be the case if the shift was based 
on a prediction). 

In one particular embodiment, the first floating point data element specifies a first 
mantissa and the second floating point data element specifies a second mantissa, and the 
shift logic comprises first shift logic provided to selectively perform a shift operation on 
the first mantissa and second shift logic provided to selectively perform a shift operation 
on the second mantissa, the at least one detector logic unit comprising a first detector 
logic unit associated with the first shift logic and a second detector logic unit associated 
with the second shift logic, the half adder logic of the first detector logic unit being 
operable to logically subtract the first exponent from the second exponent and the half 
adder logic of the second detector logic unit being operable to logically subtract the 
second exponent from the first exponent. 

It will be appreciated that the multiple processing paths may incorporate entirely 
separate logic. However, since the shifting performed within the first pipeline stage of 
the first processing path is based on an exact determination of the presence of the 
predetermined alignment condition (rather than just a prediction), it can be guaranteed 
that any shift performed is appropriate, and accordingly in one embodiment the first 
pipeline stage is common to the multiple processing paths. This enables a reduction in 
area of the processing logic and can also give rise to a reduction in power consumption. 

It will be appreciated that the data processing operation can take a variety of 
forms. However, in one embodiment the data processing operation is an unlike-signed 
addition operation. The first processing path can be arranged to allow such an unlike-signed 
addition operation to be performed in a particularly efficient manner in situations 
where the predetermined alignment condition is determined to exist. 

Viewed from a second aspect, the present invention provides a method of 
determining a processing path of a data processing apparatus to perform a data 
processing operation on first and second floating point data elements, the first floating 
point data element specifying a first exponent and the second floating point data element 
specifying a second exponent, the data processing apparatus having processing logic 
providing multiple processing paths which are selectable to perform the data processing 
operation, including a first processing path operable to perform the data processing 
operation if a predetermined alignment condition exists, the method comprising the steps 
of: (a) providing at least one detector logic unit which receives both said first exponent 
and said second exponent, and within each detector logic unit detecting the presence of 
said predetermined alignment condition by performing the steps of: (a)(i) employing half 
adder logic to perform a number of half adder operations to logically subtract one of the 
first and second exponents from the other of the first and second exponents to produce at 
least a sum data value of sum and carry data values representing the result of the number 
of half adder operations; and (a)(ii) generating a select signal which is set if the sum data 
value has a predetermined value indicating the existence of said predetermined alignment 
condition; (b) selecting the first data processing path to perform the data processing 
operation if the select signal from one of said at least one detector logic units is set. 

BRIEF DESCRIPTION OF THE DRAWINGS 
The present invention will be described further, by way of example only, with 
reference to preferred embodiments thereof as illustrated in the accompanying drawings, 
in which: 

Figure 1 is a block diagram illustrating logic provided within a near processing 
path of a data processing apparatus providing a near path and a far path for performing a 
data processing operation on first and second floating point data elements; 

Figure 2 is a diagram illustrating logic provided within the difference equals 
one/zero detector of Figure 1; 

Figure 3 is a diagram illustrating the construction of a one-bit half adder provided 
within the 8-bit half adder of Figure 2; 

Figure 4 illustrates an alternative embodiment of the difference equals one/zero 
detector of Figure 1; 

Figure 5 is a flow diagram illustrating the processing performed within each 
difference equals one/zero detector of Figure 1; 

Figure 6 is a block diagram of logic provided within a data processing apparatus 
to compute an absolute difference between first and second integer data elements in 
accordance with one embodiment; 

Figure 7 is a block diagram illustrating a prior art end around carry adder; and 

Figure 8 is a flow diagram illustrating the processing steps performed in one 
embodiment to calculate an absolute difference between first and second data elements. 

DESCRIPTION OF PREFERRED EMBODIMENTS 
A data processing apparatus may be arranged to perform a data processing 
operation on various types of data element. One type of data element which may be 
subjected to such data processing operations is the floating point data element. A 
floating point number can be expressed as follows: 
±1.x * 2^y 

where: x = fraction 
1.x = significand (also known as the mantissa) 
y = exponent 

A data processing apparatus arranged to perform certain data processing 
operations on first and second floating point data elements may provide both a near 
processing path and a far processing path. In one embodiment, the near processing path 
can be used to perform unlike-signed addition operations on the first and second floating 
point data elements. Further, the near path can be used if the first and second floating 
point data elements require at most a 1-bit alignment, whereas otherwise the far path 
needs to be used. When the input floating point data elements require at most a 1-bit 
alignment, it is possible that when performing an unlike-signed addition (i.e. equivalent 
to subtracting one data element from the other) massive cancellation may occur, and to 
enable the resultant floating point value to be correctly aligned, it is then necessary to 
provide normalisation logic within the near path. Such logic is not required in the far 
path. However, in the far path it is necessary to provide rounding logic due to the fact 
that the data elements may need more than a 1-bit alignment. Such rounding logic is not 
required in the near path. 

Figure 1 is a block diagram illustrating logic provided within the near 
processing path of a data processing system to perform the necessary processing on the 
significand portions of first and second floating point data elements when performing 
an unlike-signed addition in accordance with one embodiment. In particular, it can be 
seen from Figure 1 that the illustrated near path logic is contained within four pipeline 
stages N1 to N4. The first floating point data element A is stored in the register 10 
whilst the second floating point data element B is stored in the register 20. It will be 
appreciated that the floating point values A and B may be single precision floating 
point values or double precision floating point values. However, in the example 
illustrated in Figure 1, it is assumed that both input data elements are single precision 
floating point values. Such single precision values are 32-bit values, with the most 
significant bit specifying a sign value, the next 8 bits specifying an exponent value, and 
the final 23 bits specifying a fraction value. 
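
The 32-bit layout just described (1 sign bit, 8 exponent bits, 23 fraction bits) can be sketched in software as follows. The helper names `unpack_single` and `significand` are hypothetical, and the sketch assumes a normal (non-zero, non-denormal) value, for which the 1.x significand has an implicit leading one; it is an illustration only and not part of the described hardware.

```python
import struct


def unpack_single(value: float) -> tuple[int, int, int]:
    """Split a single precision value into (sign, exponent, fraction) fields."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31                 # most significant bit
    exponent = (bits >> 23) & 0xFF    # next 8 bits
    fraction = bits & 0x7FFFFF        # final 23 bits
    return sign, exponent, fraction


def significand(fraction: int) -> int:
    """24-bit significand 1.x, assuming a normal number (implicit leading one)."""
    return (1 << 23) | fraction


sign, exponent, fraction = unpack_single(1.5)
sig = significand(fraction)
```

Only the 8-bit exponent fields are needed by the detectors discussed below; the 24-bit significands are what the shift logic operates on.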

From each 23-bit fraction value a 24-bit significand is constructed, and the 
24-bit significand of the first data element is routed to associated shift logic 35 during 
the first pipeline stage N1, whilst the 24-bit significand of the second data 
element B is routed to associated shift logic 45 during the first pipeline stage. 
In addition, the exponents of both data elements are routed to the detectors 30, 40, 
the detector 30 being associated with shift logic 35 
and the detector 40 being associated with shift logic 45. As mentioned earlier, for the 
near path to be used, there needs to be at most a 1-bit alignment between the first and 
second data elements. This is detected by the detectors 30, 40 by a comparison of the 
exponents of both input data elements. [...] Each detector 
30, 40 seeks to detect this alignment condition by performing a number of half adder 
operations in a manner such that the detection of the alignment condition can be 
determined from analysis of the sum data value alone. 

Before discussing the logic of Figure 2, the following background is provided 
concerning the operation of an n-bit half adder and the manner in which the half adder 
operations can be performed to enable solely the analysis of the resultant sum value to 
provide an indication of the presence of the alignment condition. 

An n-bit half adder consists of n independent half adders. It takes two n-bit two's 
complement numbers as inputs, and produces two outputs: an n-bit sum and an n-bit 
carry. In the present context, the exponent values are unsigned values that can be treated 
as two's complement numbers. Let X = x_{n-1} ... x_1 x_0 and Y = y_{n-1} ... y_1 y_0 be n-bit words 
with low order bits x_0 and y_0. An n-bit half adder produces a carry word C = c_{n-1} ... c_1 0 
and a sum word S = s_{n-1} ... s_1 s_0 such that: 

c_i = x_{i-1} AND y_{i-1}     (1) 
s_i = x_i XOR y_i     (2) 

Note that c_0 is always 0, and that C + S = X + Y (modulo 2^n). 

By definition, (C,S) is in n-bit half-adder form (also referred to as n-bit h-a form) if 
there exist n-bit X and Y satisfying equations 1 and 2. We write (C,S) = ha(X,Y), and 
the modifier "n-bit" can be omitted unless it is necessary for clarity. 

Theorem 1 Let (C,S) be a number in h-a form. Then it can be proved that the situation 
where S = -1 means that C + S = -1. 

Proof 
[=>] (C,S) is in h-a form, so there exist X and Y such that X + Y = -1 and (C,S) = 
ha(X,Y). By the definition of a two's complement number, X + Y = -1 means that 
Y = NOT X. Then by equation 2, S = X XOR (NOT X) = -1. 

[<=] By the definition of h-a form (see equations (1) and (2) above), only one of c_i and 
s_{i-1} can be set for i = 1, ..., n-1, so C = 0, and C + S = -1. 
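
Theorem 1 can also be checked by brute force for a small word width. The sketch below is an illustration only; the names `half_adder` and `theorem1_holds` are assumptions, and -1 is represented as the all-ones n-bit word.

```python
def half_adder(x: int, y: int, n: int) -> tuple[int, int]:
    """Return (C, S) for n-bit words x and y per equations (1) and (2)."""
    mask = (1 << n) - 1
    return ((x & y) << 1) & mask, (x ^ y) & mask


def theorem1_holds(n: int) -> bool:
    """Check, for every pair of n-bit words, that S = -1 exactly when C + S = -1."""
    minus_one = (1 << n) - 1          # -1 as an n-bit two's complement word
    for x in range(1 << n):
        for y in range(1 << n):
            c, s = half_adder(x, y, n)
            sum_is_minus_one = ((c + s) & minus_one) == minus_one
            if (s == minus_one) != sum_is_minus_one:
                return False
    return True


assert theorem1_holds(8)
```

The check passes because S is all ones only when every bit pair differs, in which case no AND term can be set and C is zero.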

The above Theorem 1 was discussed in the paper "Early Zero Detection" by D 
Lutz et al. Proceedings of the 1996 International Conference on Computer Design, 
pages 545 to 550. However, it was only discussed in the context of integer arithmetic. 

By the above Theorem 1, it can be seen that if it is desired to determine 
whether two numbers are equal, then a half adder operation can be performed using the 
two numbers as inputs, and if the sum value has a value of -1, this will indicate that the 
carry value is zero, and that the two numbers are hence equal. However, in the current 
context of the detectors 30, 40 in Figure 1, a key requirement is to detect whether the 
two input exponents differ by one. In other words, if the two exponents are considered 
to be X and Y, then the detector logic 30, 40 needs to determine whether X-Y = 1 or 
Y-X = 1. The following discussion will illustrate why, through the use of two half 
adder operations (or their equivalent), such an alignment condition can still be detected 
merely by reviewing the value of the sum data value produced. 

Lemma 1 Given two n-bit numbers X and Y, then it can be shown that the 
equation X - Y = 1 is equivalent to the equation Y + NOT X = -2. 

Proof: 
X - Y = 1 <=> Y - X = -1 
<=> Y + NOT X + 1 = -1 (by definition of two's complement numbers) 
<=> Y + NOT X = -2 
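
Lemma 1 can be confirmed exhaustively for a given word width. The following is a minimal sketch, assuming arithmetic modulo 2^n on n-bit words; the function name `lemma1_holds` is hypothetical.

```python
def lemma1_holds(n: int) -> bool:
    """Check X - Y = 1 <=> Y + NOT X = -2 for all n-bit X and Y (modulo 2^n)."""
    mask = (1 << n) - 1
    minus_two = mask - 1                   # -2 as an n-bit two's complement word
    for x in range(1 << n):
        for y in range(1 << n):
            not_x = ~x & mask              # bitwise complement within n bits
            lhs = ((x - y) & mask) == 1
            rhs = ((y + not_x) & mask) == minus_two
            if lhs != rhs:
                return False
    return True


assert lemma1_holds(8)
```

Because the comparison is modulo 2^n, the pair X = 0, Y = 2^n - 1 also satisfies both sides in this sketch.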

The earlier theorem 1 provides an easy test for comparing sums with -1, but we need a 
test for comparing sums with -2. 

Theorem 2 Let (C,S) be a number in h-a form, and suppose c_1 = 0. Then it can be 
proved that the situation where S = -2 means that C + S = -2. 

Proof: [=>] Recall that -2 is represented in two's complement numbers as a word 
consisting of all ones except for a zero in the low order bit. c_1 = c_0 = 0, so we cannot 
have C + S = -2 unless s_1 = 1 and s_0 = 0. Now s_1 = 1 => c_2 = 0. Again, we cannot have 
C + S = -2 unless s_2 = 1. A simple induction completes this half of the proof. 
[<=] S = -2 => s_i = 1 for i = 1, 2, ..., n-1. By the definition of h-a form, only one of c_i and 
s_{i-1} can be set for i = 1, ..., n-1, so c_i = 0 for i = 2, 3, ..., n-1. By assumption, c_1 = c_0 = 0. 
Therefore, C = 0, and C + S = -2. 
The critical step in the proof above relies on the fact that the AND and XOR of two bits 
cannot both be true. 

Note that c_1 = 0 can be guaranteed by using two levels of half adders. 
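
The role of the second half adder level can be illustrated with a short sketch. The names `half_adder`, `sum_after_two_levels` and `detects_difference_of_one` are assumptions made for this illustration, and the exhaustive loop is simply a software check of Theorem 2 combined with Lemma 1, not a description of the hardware.

```python
def half_adder(x: int, y: int, n: int) -> tuple[int, int]:
    """Return (C, S) for n-bit words x and y per equations (1) and (2)."""
    mask = (1 << n) - 1
    return ((x & y) << 1) & mask, (x ^ y) & mask


def sum_after_two_levels(x: int, y: int, n: int) -> int:
    """Final sum word S after forming Y + NOT X with two half adder levels."""
    mask = (1 << n) - 1
    c1, s1 = half_adder(~x & mask, y, n)   # first level: intermediate carry and sum
    c2, s2 = half_adder(c1, s1, n)         # second level: now c_1 = c_0 = 0
    assert c2 & 0b11 == 0
    return s2


def detects_difference_of_one(n: int) -> bool:
    """Check that S = -2 exactly when X - Y = 1 (modulo 2^n)."""
    mask = (1 << n) - 1
    return all(
        (sum_after_two_levels(x, y, n) == mask - 1) == (((x - y) & mask) == 1)
        for x in range(1 << n)
        for y in range(1 << n)
    )


assert detects_difference_of_one(8)

# One level is not enough: for X = 4, Y = 3 the first-level sum is 0xF8, not 0xFE.
assert half_adder(~4 & 0xFF, 3, 8)[1] == 0xF8
assert sum_after_two_levels(4, 3, 8) == 0xFE
```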

Figure 2 illustrates logic provided within each detector 30, 40 that uses the 
concepts set out in the above proof to detect a condition where the sum value equals -2, 
thereby indicating the presence of the required alignment condition, i.e. that the two 
exponents differ by one. In particular, one of the exponents X is latched in register 200 
while the other exponent Y is latched in register 205. The detector 30 is used to evaluate 
whether subtraction of the exponent of A from the exponent of B gives a result of one, 
whilst the detector logic 40 is used to evaluate whether subtracting the exponent of B 
from the exponent of A gives a result of one. Accordingly, with reference to Figure 2, 
for the detector 30, the exponent of B is placed in register 200, and the exponent of A is 
placed in register 205, whilst for detector logic 40, the exponent of A is placed in register 
200 and the exponent of B is placed in register 205. 

The inverter 210 inverts the exponent value X stored in register 200 prior to input 
to the 8-bit half adder 215, and the 8-bit half adder 215 is then arranged to perform the 
above equations 1 and 2 on each pair of bits received from registers 200, 205. In 
particular, as illustrated schematically in Figure 3, within the 8-bit half adder 215 are 
provided eight 1-bit half adders 275, each 1-bit half adder 275 including an exclusive OR 
gate 280 and an AND gate 285. Given the earlier equations 1 and 2, it can be seen that 
the output from AND gate 285 is the carry value c'_{i+1} and the output from XOR gate 
280 is the sum value s'_i. The apostrophe after the c and s values is intended to indicate 
that these carry values and sum values are intermediate carry and sum values. 

Given the earlier discussed Theorem 2, a second level of half adder is required to 
perform a second half adder operation before the resultant sum value can be assessed to 
determine whether that sum value is -2. However, since in this implementation there is 
no interest in the resultant carry value, then the second half adder operation only needs to 
perform a partial half adder operation in order to generate the resultant sum data value, 
and accordingly instead of a second 8-bit half adder, the sequence of XOR gates 220, 
225, 230, 235, 240, 245 and 250 can be used. These will implement the earlier 
mentioned equation 2 for i = 1 to 7. Since by definition of the half adder form the bit 
zero of the intermediate carry value will be zero, this means by virtue of the earlier 
equation 2 that bit zero of the final sum value must be the same as bit zero of the 
intermediate sum value. As discussed earlier, if the final sum value is to be -2, this will 
require that all bits of the final sum value other than the least significant bit are 1, and 
that the least significant bit is 0. Accordingly, inverter 255 is used to invert bit zero of 
the intermediate sum value (equivalent to bit zero of the final sum value), as a result of 
which AND gate 260 will only output a logic one value if the final sum value is -2. 
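
The gate arrangement described above can be modelled bit by bit. The following is a behavioural sketch only: the function name `figure2_shift_signal` and the list-of-bits representation are assumptions, and the comments map each step onto the reference numerals used in the text.

```python
def figure2_shift_signal(x: int, y: int) -> int:
    """Bit-level model of the shift-signal path for 8-bit exponents.

    x is the exponent held in register 200, y the exponent held in register 205.
    Returns 1 when the final sum value is -2, otherwise 0.
    """
    n = 8
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> i) & 1 for i in range(n)]
    nx = [b ^ 1 for b in xb]                           # inverter 210
    s1 = [nx[i] ^ yb[i] for i in range(n)]             # XOR gates 280: s'_i
    c1 = [0] + [nx[i] & yb[i] for i in range(n - 1)]   # AND gates 285: c'_{i+1}; top carry unused here
    upper = [s1[i] ^ c1[i] for i in range(1, n)]       # XOR gates 220 to 250: final sum bits 1 to 7
    out = s1[0] ^ 1                                    # inverter 255 on bit zero
    for bit in upper:                                  # AND gate 260 over all eight inputs
        out &= bit
    return out


# Detector 30 arrangement: exponent of B in register 200, exponent of A in register 205.
exp_a, exp_b = 126, 127
assert figure2_shift_signal(exp_b, exp_a) == 1   # B's exponent exceeds A's by one
assert figure2_shift_signal(exp_a, exp_b) == 0
```

Only seven XOR gates are needed in the second level because bit zero of the intermediate carry is always zero.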

This output from AND gate 260 is used as a shift signal to input to the associated 
shift logic 35, 45. Accordingly, if a logic one shift signal is produced by the detector 30, 
this will cause the shift logic 35 to shift the significand of the data element A right by 1 
bit, whilst if alternatively a logic one shift signal is produced by the detector 40, this will 
cause the shift logic 45 to shift the significand of data element B right by 1 bit position. 

In addition to determining the shift signal, it is required that the detectors 30, 40 
also detect whether it is appropriate to use the near processing path instead of the far 
processing path. This will be the case if either shift signal from the detectors 30, 40 is 
set, but will also be the case if in fact the exponents are equal. Accordingly, the detectors 
30, 40 will typically also include logic for detecting whether the two exponents are equal, 
and it will be appreciated by those skilled in the art that such logic can be implemented in 
a variety of ways. With reference to Figure 2, the output from such "difference equals 
zero" detect logic will be routed to OR gate 270 along with the shift signal output by 
AND gate 260, with the output of the OR gate 270 providing the select signal. 
Accordingly, the select signal will be set if either the shift signal is set or the output from 
a difference equals zero detector is set. This select signal, which is produced in pipeline 
stage N1, can be sent to enable logic to cause that enable logic to then disable the logic in 
the far processing path in the event that the select signal is set. 

Hence, this early generation of a select signal enables the far processing path to 
be turned off at an early stage in the event that the alignment condition required for using 
the near path is detected, thus allowing significant power savings to be achieved. In 
addition, since the detectors 30, 40 detect the actual presence of the required alignment 
condition for using the near path, rather than merely making a prediction about the 
presence of that alignment condition, it can be guaranteed that any shifts performed by 
the shift logic 35, 45 are correct. This means that the logic in pipeline stage N1 can be 
shared by both the near path logic and the far path logic, providing savings in terms of 
area and power. 

Figure 4 is a block diagram illustrating an alternative configuration of the 
detector logic 30 or 40, in which the detection of both exponents being equal is 
performed directly by the logic that is being used to detect whether the exponents differ 
by one. As can be seen by comparison of Figure 4 with Figure 2, the 8-bit AND gate 260 
of Figure 2 is replaced by a 7-bit input AND gate 300. Given the earlier discussions it 
will be appreciated that if the exponents are equal, the sum value output by the detector 
logic will have a value of -1 (i.e. 11111111), whilst if the exponents differ by one the 
sum value will have a value of -2 (i.e. 11111110). Accordingly, if all bits other than the 
least significant bit are set to a logic one value, this will directly indicate that the select 
signal should be set, since this will confirm that the exponents either differ by one or are 
equal. The condition that the exponents differ by one can then be captured by routing the 
output from AND gate 300 to the input of AND gate 310, which also receives the output 
from inverter 255. Hence, it can be seen that the logic of Figure 4 will produce both the 
shift signal and the select signal, which can then be used in the manner described earlier 
with reference to Figure 2. 

Figure 5 is a flow diagram schematically illustrating the processing performed 
within each detector logic unit 30, 40. At step 400, one of the exponents is inverted and 
added to the other exponent in half adder logic in order to produce an intermediate carry 
value and an intermediate sum value. Then, at step 410, the computation Si = c'i XOR s'j 
is performed for all bits of the intermediate carry and sum values other than the least 
significant bit. Meanwhile, at step 420, the least significant bit of the intermediate sum 
value is inverted. By reference to Figures 2 and 4, it will be appreciated that in one 
embodiment steps 410 and 420 are performed in parallel. 

Then, at step 430, it is determined whether the sum value produced is equal to -2, 
and if so the process proceeds to step 440, where the shift signal is set and the select 
signal is set. In Figure 2, this occurs via the outputs from AND gate 260 and OR gate 
270, whilst in Figure 4 this occurs via the outputs from AND gates 300 and 310. 
If at step 430 it is determined that the sum value does not equal -2, then the shift 
signal is not set at step 450. The process then proceeds to step 460, where the select 
signal is then only set by the detector logic unit if the first and second exponents are 
detected to be equal. 

It will be appreciated that the process of Figure 5 is performed independently 
within each detector logic unit 30, 40, with the detector logic 40 inverting the first 
exponent, whilst the detector 30 inverts the second exponent. 
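
The carry-free detection of Figures 2, 4 and 5 can be modelled in software. The following is a minimal sketch rather than the circuit itself: the function name `detect`, the default 8-bit width and the bit indexing (each result bit above the least significant bit taken as sum bit i XOR carry bit i-1) are assumptions made for illustration.

```python
def detect(e_own: int, e_other: int, width: int = 8) -> tuple[bool, bool]:
    """Model one detector and return (shift, select).

    e_own   -- exponent of the element whose significand this detector may shift
    e_other -- exponent that this detector inverts (step 400)
    """
    mask = (1 << width) - 1
    inverted = ~e_other & mask

    # Step 400: bitwise half adders give the intermediate sum and carry values.
    s = e_own ^ inverted
    c = e_own & inverted

    # Step 410 (assumed indexing): for every bit above the LSB, XOR sum bit i
    # with carry bit i-1. No carry is propagated along the word.
    upper = (s ^ (c << 1)) & mask & ~1
    upper_all_ones = upper == (mask & ~1)     # 7-input AND gate 300 of Figure 4

    # Step 420: invert the LSB of the intermediate sum (inverter 255).
    lsb_inverted = (s & 1) == 0

    shift = upper_all_ones and lsb_inverted   # sum == -2: exponents differ by one
    select = upper_all_ones                   # sum == -1 or -2: near path usable
    return shift, select
```

For example, `detect(126, 127)` returns `(True, True)` and `detect(127, 127)` returns `(False, True)`. Because the model works modulo 2^width, it also reports a difference of one across the wrap-around from the all-ones exponent to zero.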

The logic of Figures 2 or 4 provides a particularly efficient technique for allowing 
quick detection of a difference of one between the exponents of the two input floating 
point data elements. For example, with reference to the embodiment of Figure 2, the 
delay for an 8-bit implementation will be the delay of two XOR gates followed by an 
8-input AND function (for example logic equivalent to an 8-input AND gate). This 
computation can be performed in the first pipeline stage N1 to enable the shift and select 
signals to be generated during that first pipeline stage. 

Returning to Figure 1, it can be seen that the outputs from the shift logic 35 and 
the shift logic 45 are latched in the registers 55, 60, respectively, and accordingly these 
registers will store the 24-bit significands of the input data elements A and B, shifted one 
place to the right as appropriate. 

In pipeline stage N2, it is required to determine the absolute difference between 
these stored significand values (the absolute difference being the magnitude of the 
difference between the two data elements, expressed as a positive value). One known 
approach for performing such an absolute difference computation is to use an end around 
carry adder such as is illustrated in Figure 7. As shown in Figure 7, the significand 
from data element B is inverted prior to input to the end around carry adder 600, with the 
significand of data element A being input without inversion. The carry out from the 
adder 600 is routed via path 610 as a carry in to the adder. The output from the end 
around carry adder is routed via path 620 to one input of the multiplexer 630, and is also 
routed via path 625 where it is inverted prior to input to the other input of the multiplexer 
630. 

If the significand of A is larger than the significand of B, then this will be 
indicated by a logic zero value in the most significant bit position of the output from the 
end around carry adder 600, and hence this most significant bit can be routed over path 
615 to control the output from the multiplexer 630. Similarly, if the significand of A is 
less than the significand of B, then the output from the end around carry adder will be 
negative (as indicated by a logic one value in the most significant bit position), and 
negation of the result is required in order to produce the absolute difference value. This 
can be achieved by using the most significant bit of the output from the end around carry 
adder to select as the output from the multiplexer 630 the signal received at the second 
input of that multiplexer (i.e. an inverted version of the output from the end around carry 
adder). 
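
For comparison with the approach described later, the known end around carry arrangement of Figure 7 can be sketched behaviourally. This is an illustrative model only: the function name `end_around_abs_diff` is hypothetical, one guard bit above the significand width is assumed so that the most significant bit behaves as a sign, and the combinational feedback of the carry is modelled as a second addition.

```python
def end_around_abs_diff(a: int, b: int, width: int = 24) -> int:
    """Behavioural model of Figure 7 for unsigned a, b < 2**width."""
    bits = width + 1                       # assumed guard bit acting as a sign bit
    mask = (1 << bits) - 1

    total = a + (~b & mask)                # B is inverted before adder 600
    carry_out = total >> bits              # carry out, routed back over path 610
    result = (total + carry_out) & mask    # carry out re-enters as the carry in

    msb = (result >> (bits - 1)) & 1       # path 615 controls multiplexer 630
    # MSB clear: result is already A - B. MSB set: invert to obtain B - A.
    return (~result & mask) if msb else result
```

Note that the multiplexer selection depends on the most significant bit of the adder output, so the selection cannot begin until the addition, including the fed-back carry, has completed.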

However, such an end around carry adder is relatively slow when compared with 
a normal adder, and this speed problem is compounded by the fact that the selection of 
the output or the inverted version of the output can only be made once the most 
significant bit of the output from the end around carry adder is known. 

As cycle times reduce, it is envisaged that there will be insufficient time in the 
pipeline stage N2 for the use of such an end around carry adder to compute the absolute 
difference. 



As an alternative to using such an end around carry adder, an approach can be 
used where a determination is made as to which of the first and second data elements is 
the larger, with the ordering of the significand values then being swapped if required 
prior to input to a normal adder. This approach can ensure that when the normal adder is 
used to subtract one significand from the other, the significand of the smaller data 
element is the one that is subtracted from the other significand, thereby ensuring that the 
output from the adder is a positive value. However, as cycle times decrease, it is 
envisaged that there will be insufficient time in pipeline stage N2 to allow such swapping 
of the significand values to take place prior to input to the adder. 

In accordance with the embodiment illustrated in Figure 1, absolute difference 
logic is provided in pipeline stage N2 consisting of the inverter 70, the adder 80, the 
inverter 85 and the multiplexer 90. The adder 80 and the multiplexer 90 receive a signal 
stored in the carry register 65, this signal being generated by logic 50 provided in pipeline 
stage N1. In the example illustrated in Figure 1, this logic 50 is arranged to receive the 
exponent and fraction portions from both input data elements A and B, and to detect 
which of the data elements is the larger. It will be appreciated by those skilled in the art 
that the logic 50 can be arranged in a variety of ways. However, in one embodiment, the 
logic 50 is arranged to perform a non-redundant subtract operation on its two input 
values, with a comparison result being output for storage in the carry register 65 which 
comprises a carry out result of the non-redundant subtract operation. In particular, the 
comparison result is set to a logic one value if the data element A is larger than or equal 
to the data element B, and is set to a logic zero value if the data element B is larger than 
the data element A. 

Before discussing the operation of the absolute difference logic employed in 
pipeline stage N2, the following discussion is provided to indicate why the value of the 
carry signal stored in the carry register 65 can be arranged to ensure that the absolute 
difference logic produces a positive result without the need to provide logic for 
selectively swapping the ordering of the significand values before they are input to the 
absolute difference logic. 

A two's complement adder produces a difference by inverting the subtrahend, and 
then adding it to the minuend with a carry-in of one. This works because for two's 
complement numbers, $A - B = A + \overline{B} + 1$. In the present context, the significand values 
are unsigned values that can be treated as two's complement numbers. 

The technique of the present embodiment is to manipulate the carry-in to the 
adder based on the magnitude comparison done in the preceding cycle by logic 50. 
Suppose that A > B. In this case, A - B is positive, so we set the carry-in to one and 
compute $A + \overline{B} + 1$. 

Now suppose A < B. In this case, A - B is negative, and in order to easily 
compute the absolute value, we set carry-in to zero and compute $A + \overline{B}$, and then invert 
the sum to get |A - B|. 

The reason this works is that, for two's complement numbers, $-X = \overline{X} + 1$, and so 
$\overline{X} = -X - 1$, which means that $X = \overline{\overline{X}} = \overline{-X - 1}$. This means that if we compute -X-1, 
then we can get X with a simple inversion. With respect to our original problem, -X = A 
- B, and $-X - 1 = A - B - 1 = A + \overline{B}$. 
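
As a worked illustration (a 4-bit example chosen here for brevity, not taken from the figures, with an overline denoting bitwise inversion), take A = 3 and B = 5 for the case A < B, and then swap the operands for the case A > B:

```latex
\begin{align*}
A = 0011_2,\ B = 0101_2:\quad
  & A + \overline{B} = 0011_2 + 1010_2 = 1101_2, \\
  & \overline{1101_2} = 0010_2 = 2 = |3 - 5|. \\[4pt]
A = 0101_2,\ B = 0011_2:\quad
  & A + \overline{B} + 1 = 0101_2 + 1100_2 + 1 = 1\,0010_2, \\
  & \text{and discarding the carry out leaves } 0010_2 = 2 = |5 - 3|.
\end{align*}
```

In both cases the same adder inputs are used; only the carry-in and the final optional inversion differ.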

Accordingly, returning to Figure 1, it can be seen that the output from register 60 
is inverted by inverter 70 and then provided as one of the inputs to the adder 80. The 
other input of the adder logic 80 is the significand value stored in the register 55. If the 
carry value in register 65 is set to one, indicating that data element A is larger than data 
element B, then a carry-in of one will be fed to the adder 80, and the output from the 
adder 80 will correctly identify the absolute difference value. Accordingly, the same 
logic one value from the carry register 65 can be used to cause the multiplexer 90 to 
output the output from the adder 80 directly for storage in the sum register 100. 
However, if the value in the carry register 65 is a logic zero value, then the adder 80 will 
receive a logic zero value as a carry-in, and the output from the adder 80 will need to be 
inverted in order to produce the absolute difference value. This is achieved by using the 
carry value from the carry register 65 to drive the multiplexer 90, which in this scenario 
will cause the multiplexer to output to the sum register 100 the inverted input received 
via inverter 85. 

Accordingly, it can be seen that through the use of the absolute difference logic 
illustrated in pipeline stage N2 of Figure 1, the absolute difference value can be 
generated and stored in the sum register 100 without the need to provide any means for 
swapping the ordering of the inputs to the adder 80. This provides a significant 
performance improvement, which ensures that the absolute difference can be calculated 
in a single pipeline stage N2. 
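
The behaviour of the absolute difference logic of Figure 1 can be sketched in a few lines. This is a software model under stated assumptions, not a description of the hardware timing: the function name `abs_diff` is hypothetical, the operands are treated as unsigned values of `width` bits, and the comparison that logic 50 performs in the preceding stage is modelled as a simple `>=`.

```python
def abs_diff(a: int, b: int, width: int = 24) -> int:
    """Return |a - b| for unsigned a, b < 2**width without swapping operands."""
    mask = (1 << width) - 1

    carry = 1 if a >= b else 0                  # comparison result held in register 65
    total = (a + (~b & mask) + carry) & mask    # inverter 70 feeding adder 80
    inverted = ~total & mask                    # inverter 85

    # Multiplexer 90: the same carry bit chooses between the sum and its inverse.
    return total if carry else inverted
```

Because the select input of the multiplexer is known before the addition starts, the selection does not have to wait for any bit of the adder output.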

As mentioned earlier, when the input floating point data elements require at most 
a 1-bit alignment, then it is possible that when performing an unlike-signed addition 
operation within the near path logic, massive cancellation may occur. This means that 
when logically subtracting one significand value from the other, the result may have a 
significant number of leading zeros. The presence of such leading zeros is detected by 
the leading zero adjust detector 75 which is arranged to receive the output from register 
55 and the inverted version of the output from register 60 as produced by inverter 70. 
The leading zero adjust detector 75 is constructed in a standard manner, and produces a 
5-bit output signal identifying the number of leading zeros predicted to exist in the sum 
stored in the register 100, this value being stored in the register 95. Normalisation logic 
105 is then provided in pipeline stage N3 for normalising the value stored in the register 
100 based on the LZA value output from register 95. 

As will be appreciated by those skilled in the art, since the leading zero adjust 
detector 75 is an anticipator of the number of leading zeros, it is possible that the 
adjustment performed by the normalisation logic 105 is out by 1 bit. Hence, once the 
normalised result has been produced by the normalisation logic 105, the output is 
evaluated to check whether the most significant bit is a logic one value. If it is, then no 
further adjustment is required, whereas if the most significant bit is a logic zero value, 
then a further 1-bit adjustment is performed within the 1-bit adjustment logic 110, 
whereafter the result is stored in the register 115. 
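
The two-step normalisation can be illustrated with a small model. This is a sketch under assumptions: the function name `normalise` is hypothetical, the anticipated leading zero count is taken to be either exact or one too small, and the internals of the leading zero adjust detector 75 are not modelled at all.

```python
def normalise(value: int, anticipated_lz: int, width: int = 24) -> int:
    """Left-shift `value` by the anticipated count, then correct by one bit if needed."""
    mask = (1 << width) - 1

    # Normalisation logic 105: shift by the anticipated number of leading zeros.
    shifted = (value << anticipated_lz) & mask

    # 1-bit adjustment logic 110: if the MSB is still zero, shift once more.
    if not (shifted >> (width - 1)) & 1:
        shifted = (shifted << 1) & mask
    return shifted
```

With `width=8`, both `normalise(0b00010110, 3, 8)` (exact count) and `normalise(0b00010110, 2, 8)` (count one too small) yield `0b10110000`.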

The result in the register 115 is the result of the unlike-signed addition operation 
performed by the near path logic. In pipeline stage N4, this is routed to a multiplexer 
120, which is also arranged to receive the result from the far path, such that one of the 
results can be selected for storing in the adder result register 125 as the result of the 
unlike-signed addition operation. 

As mentioned earlier, in preferred embodiments, if the select signal from either 
detector 30, 40 is set in pipeline stage N1, then this indicates that the near path should be 
used to perform the operation rather than the far path, and accordingly a signal derived 
from this select signal can be used to control the multiplexer 120. 

In one embodiment of the present invention, each input operand to the data 
processing apparatus only includes a single floating point data element. However, in an 
alternative embodiment, Single Instruction Multiple Data (SIMD) processing is 
performed by the data processing apparatus, in which event each input operand will 
comprise a plurality of floating point data elements. In such embodiments, it is 
envisaged that the logic of Figure 1 will be replicated for each pair of first and second 
floating point data elements provided by the input operands. Accordingly, a pair of first 
and second floating point data elements extracted from first and second operands will be 
stored in the registers 10, 20, with the logic of Figure 1 then being replicated for each pair 
of first and second floating point data elements. 

The absolute difference logic described earlier can also be applied in a data 
processing apparatus used to manipulate integer data elements, as is illustrated by Figure 
6. In the example of Figure 6, the first integer data element is stored in register 500 and 
the second integer data element is stored in register 505. In pipeline stage Nl, the logic 
510 is arranged to receive the first and second integer data elements and to determine 
which is the largest value, with a comparison result bemg output for storing in the carry 
value register 525 indicative of the result of that comparison. As will be discussed later, 
the logic 510 can actually be arranged to produce a plurality of comparison results in the 
event that SIMD processing is being performed, but for the time being we will assume 
that only a single integer data element is included within each input operand, and that 
accordingly the logic 510 produces a single comparison result (i.e. n = 1). 

The contents of the registers 500 and 505 pass directly through stage N1, where 
they are stored in registers 515, 520, respectively. Thereafter, the outputs of these 
registers 515 and 520 are routed through the absolute difference logic 530, 535, 540, 545, 
which operates in an identical manner to that discussed earlier with reference to the 
absolute difference logic 70, 80, 85, 90 of Figure 1. This results in an absolute difference 
result being output from the multiplexer 545 for storing in the result register 550. 

In one embodiment, it is envisaged that integer SIMD processing may be 
performed within the data processing apparatus, in which event a first operand will 
comprise a plurality of first integer data elements and a second operand will comprise a 
plurality of second integer data elements. In such embodiments, the entire operands are 
stored in the registers 500, 505, with the logic 510 being operable to receive the first and 
second operands and to produce, for each pair of first and second integer data elements 
provided by the first and second operands, an associated comparison result for storing in 
the carry value register 525. Accordingly, if, for the sake of example, four integer data 
elements are included within each input operand, a 4-bit value will be output from the 
logic 510 for storing in the carry value register 525. In the general sense, an n-bit value is 
output from the logic 510 for storing in the carry value register 525, where n is equal to 
the number of integer data elements contained within each input operand. 

Considering now pipeline stage N2, the inverter 530 will invert the entirety of the 
second operand B prior to inputting that inverted version of the operand to the adder 535. 
For each pair of first and second integer data elements, the adder 535 will then add the 
associated inverted data element from the second operand to the corresponding data 
element from the first operand and to the associated comparison result received from the 
carry value register 525 in order to produce an associated intermediate result. This will 
result in the output from the adder 535 containing a sequence of n intermediate results, 
with this sequence being inverted by inverter 540 to create an inverted sequence. The 
multiplexer 545 is then arranged, for each pair of first and second integer data elements, 
to output as the associated absolute difference either the associated intermediate result or 
the inverted version of the associated intermediate result dependent on the associated 
comparison result received from the carry value register 525. 

Accordingly, it can be seen that when performing integer arithmetic, a single 
block of absolute difference logic, and associated comparison logic 510, can be used to 
calculate in parallel the absolute difference for a plurality of pairs of integer data 
elements contained within the pair of input operands. 
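
The SIMD arrangement of Figure 6 can be modelled lane by lane. The sketch below is illustrative only: the function name `simd_abs_diff`, the default of four 8-bit lanes, the treatment of each lane as an unsigned value and the assumption that carries do not cross lane boundaries in the adder 535 are all choices made here for the example.

```python
def simd_abs_diff(a: int, b: int, lanes: int = 4, lane_bits: int = 8) -> int:
    """Return the packed per-lane |a_i - b_i| of two packed unsigned operands."""
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for lane in range(lanes):
        offset = lane * lane_bits
        x = (a >> offset) & lane_mask
        y = (b >> offset) & lane_mask

        carry = 1 if x >= y else 0                        # one bit of register 525
        total = (x + (~y & lane_mask) + carry) & lane_mask    # inverter 530, adder 535
        inverted = ~total & lane_mask                     # inverter 540

        # Multiplexer 545 selects per lane using that lane's comparison bit.
        result |= (total if carry else inverted) << offset
    return result
```

For example, `simd_abs_diff(0x01FF1080, 0x0200207F)` returns `0x01FF1001`, which is the four lane-wise differences 0x01, 0xFF, 0x10 and 0x01 packed back together.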

Figure 8 is a block diagram illustrating the process performed when computing 
an absolute difference for first and second data elements when using the logic of either 
Figure 1 or Figure 6. For simplicity, Figure 8 considers the non-SIMD approach. At step 
700, the first and second data elements are compared and a comparison result is produced 
indicative of which data element is the larger data element. At step 710, it is then 
determined whether the comparison result indicates that the first data element is greater 
than or equal to the second data element. If this is the case, then the process proceeds to 
step 720, where the data element A is added to the inverted version of the data element B 
and added to a logic one value in order to generate the absolute difference. 

However, if at step 710 it is determined that the comparison result indicates that 
the second data element is greater than the first data element, then the process proceeds to 
step 730, where the data element A is added to the inverted version of the data element B 
to generate an intermediate result. Thereafter, at step 740, the intermediate result is 
inverted in order to generate the absolute difference. 

By using the absolute difference logic discussed above with reference to Figures 
1 and 6, the generation of the absolute difference can be produced in a particularly 
efficient manner. In particular, the critical path for the implementation of the absolute 
difference logic is the path involving a subtraction, inversion, and then the driving of a 
two-input multiplexer. This provides a significantly faster implementation than the known 
prior art techniques. 

Although a particular embodiment of the invention has been described herein, it 
will be apparent that the invention is not limited thereto, and that many modifications and 
additions may be made within the scope of the invention. For example, various 
combinations of the features of the following dependent claims could be made with the 
features of the independent claims without departing from the scope of the present 
invention. 



